We study the use of model-based reinforcement learning methods, in particular world models, for continual reinforcement learning. In continual reinforcement learning, an agent is required to solve tasks sequentially, one after another, while retaining performance on past tasks and preventing forgetting. World models offer a task-agnostic solution: they do not require knowledge of task changes. World models are a straightforward baseline for continual reinforcement learning for three main reasons. Firstly, forgetting in the world model is prevented by persisting experience replay buffers across tasks: experience from previous tasks is replayed when learning the world model. Secondly, they are sample efficient. Thirdly, they offer a task-agnostic exploration strategy through the uncertainty in the trajectories generated by the world model. We show that world models are a simple and effective continual reinforcement learning baseline. We study their effectiveness on Minigrid and Minihack continual reinforcement learning benchmarks and show that they outperform state-of-the-art task-agnostic continual reinforcement learning methods.
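To make the first ingredient concrete, here is a minimal sketch of a replay buffer that is never reset at task boundaries, so world-model updates keep mixing experience from all tasks seen so far. The class and the two-task loop are illustrative placeholders, not the paper's implementation.

```python
import random
from collections import deque

class PersistentReplayBuffer:
    """Replay buffer that is never reset at task boundaries (illustrative sketch)."""

    def __init__(self, capacity=1_000_000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        # transition = (obs, action, reward, next_obs, done); no task id is stored
        self.storage.append(transition)

    def sample(self, batch_size):
        # uniform sampling mixes experience from all tasks seen so far,
        # so world-model updates keep rehearsing earlier tasks
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))

# Usage: the same buffer object is reused for every task in the sequence.
buffer = PersistentReplayBuffer(capacity=10_000)
for task_id in range(2):                    # placeholder two-task sequence
    for step in range(100):                 # stand-in for environment interaction
        buffer.add((f"obs_{task_id}_{step}", 0, 0.0, "next_obs", False))
    batch = buffer.sample(32)               # batch used to train the world model
```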
The ability of continual learning systems to transfer knowledge from previously seen tasks in order to maximize performance on new tasks is a significant challenge for the field, limiting the applicability of continual learning solutions to realistic scenarios. Consequently, this study aims to broaden our understanding of transfer and its driving forces in the specific case of continual reinforcement learning. We adopt SAC as the underlying RL algorithm and Continual World as a suite of continuous control tasks. We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. The best set of choices, dubbed ClonEx-SAC, is evaluated on the recent Continual World benchmark. ClonEx-SAC obtains 87% final success rate, compared to 80% of PackNet, the best method in the benchmark. Moreover, the transfer grows from 0.18 to 0.54 according to the metric provided by Continual World.
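For context on the reported 0.18 to 0.54 gain, the sketch below computes a per-task forward-transfer score, assuming the Continual World metric compares the normalized area under the success-rate training curve against a single-task reference run. The function name and the exact normalization are our assumption, not a quotation of the benchmark's code.

```python
import numpy as np

def forward_transfer(success_curve, baseline_curve):
    """Per-task forward transfer, assuming the Continual World-style definition:
    area-under-curve gain of the sequential run over a single-task reference run."""
    auc = float(np.mean(success_curve))        # normalized area under the success-rate curve
    auc_ref = float(np.mean(baseline_curve))
    return (auc - auc_ref) / (1.0 - auc_ref)   # > 0 indicates positive transfer

# toy example: the method reaches high success faster than the reference run
ft = forward_transfer(success_curve=[0.2, 0.6, 0.9, 1.0],
                      baseline_curve=[0.1, 0.3, 0.6, 0.9])
print(round(ft, 2))
```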
Complex reasoning problems contain states that vary in the computational cost required to determine a good action plan. Taking advantage of this property, we propose Adaptive Subgoal Search (AdaSubS), a search method that adaptively adjusts the planning horizon. To this end, AdaSubS generates diverse sets of subgoals at different distances. A verification mechanism is employed to swiftly filter out unreachable subgoals, allowing the search to focus on feasible further subgoals. In this way, AdaSubS benefits from the efficiency of planning with longer subgoals and the fine control offered by shorter ones. We show that AdaSubS significantly surpasses hierarchical planning algorithms on three complex reasoning tasks: Sokoban, the Rubik's Cube, and the inequality-proving benchmark INT, setting a new state of the art on INT.
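A minimal sketch of one planning step in the spirit of the described loop: propose subgoals at several distances, reject unreachable ones with a cheap verifier, and fall back to shorter horizons when long ones fail. All interfaces (generators, verifier, low-level search) are hypothetical stand-ins, not the AdaSubS codebase.

```python
def adaptive_subgoal_step(state, subgoal_generators, verifier, low_level_search):
    """One planning step in the spirit of AdaSubS (hypothetical interfaces).

    subgoal_generators: dict mapping a distance k to a function that proposes
        candidate subgoals roughly k steps ahead of `state`.
    verifier: cheap check that a proposed subgoal is reachable from `state`.
    low_level_search: returns a concrete plan from `state` to a subgoal,
        or None if it fails within its budget.
    """
    # try the most distant (most efficient-to-plan-with) subgoals first,
    # fall back to shorter horizons when they are rejected
    for distance in sorted(subgoal_generators, reverse=True):
        for subgoal in subgoal_generators[distance](state):
            if not verifier(state, subgoal):      # quickly filter unreachable subgoals
                continue
            plan = low_level_search(state, subgoal)
            if plan is not None:
                return subgoal, plan
    return None, None                             # no feasible subgoal at any horizon
```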
The growth of deep reinforcement learning (RL) has brought multiple exciting tools and methods to the field. This rapid expansion makes it important to understand the interplay between individual elements of the RL toolbox. We approach this task from an empirical perspective by conducting a study in the continuous control setting. We present multiple insights of a fundamental nature, including: averaging the outputs of multiple actors trained from the same data boosts performance; existing methods are unstable across training runs, epochs of training, and evaluation runs; the commonly used additive action noise is not required for effective training; exploration based on posterior sampling explores better than approximated UCB combined with weighted Bellman backups; weighted Bellman backups alone cannot replace clipped double Q-learning; the initialization of the critics plays an important role in ensemble-based actor-critic exploration. As a conclusion, we show how existing tools can be brought together in a novel way, giving rise to the Ensemble Deep Deterministic Policy Gradients (ED2) method, which yields state-of-the-art results on continuous control tasks from OpenAI Gym MuJoCo. On the practical side, ED2 is conceptually straightforward, easy to code, and does not require knowledge outside of the existing RL toolbox.
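The first insight (averaging the outputs of several actors trained on the same data) can be illustrated with a few lines; the toy linear actors below are placeholders and this is not the full ED2 method.

```python
import numpy as np

def ensemble_action(actors, observation):
    """Average the deterministic actions proposed by an ensemble of actors
    (a sketch of the 'averaging multiple actors' insight, not the full ED2 method)."""
    actions = np.stack([actor(observation) for actor in actors])
    return actions.mean(axis=0)

# toy actors: linear policies with different random weights, standing in for
# networks trained from the same data with different initializations
rng = np.random.default_rng(0)
actors = [lambda obs, W=rng.normal(size=(2, 4)): np.tanh(W @ obs) for _ in range(5)]
obs = rng.normal(size=4)
print(ensemble_action(actors, obs))   # averaged 2-dimensional action
```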
Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite apparent similarity to the single-agent case, multi-agent problems are often harder to train and analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm that extends V-Trace to the MARL setting. A key advantage of our algorithm is its high scalability in multi-worker settings. To this end, MA-Trace utilizes importance sampling as an off-policy correction method, which allows computations to be distributed with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded: we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all its tasks and exceeds state-of-the-art results.
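The importance-sampling correction that MA-Trace builds on is the standard V-trace target. Below is a generic sketch of that computation under the usual clipping scheme; it is not the MA-Trace codebase and the multi-agent specifics are omitted.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos, gamma=0.99,
                   rho_clip=1.0, c_clip=1.0):
    """Compute V-trace value targets (the off-policy correction MA-Trace extends).
    rewards, values: arrays of length T; rhos: per-step importance ratios pi/mu.
    Generic sketch of single-agent V-trace, not the MA-Trace implementation."""
    T = len(rewards)
    values_ext = np.append(values, bootstrap_value)
    clipped_rhos = np.minimum(rho_clip, rhos)
    cs = np.minimum(c_clip, rhos)
    # clipped temporal-difference errors
    deltas = clipped_rhos * (rewards + gamma * values_ext[1:] - values_ext[:-1])

    vs_minus_v = np.zeros(T + 1)
    for t in reversed(range(T)):               # backward recursion over the trajectory
        vs_minus_v[t] = deltas[t] + gamma * cs[t] * vs_minus_v[t + 1]
    return values_ext[:-1] + vs_minus_v[:-1]

targets = vtrace_targets(rewards=np.array([1.0, 0.0, 1.0]),
                         values=np.array([0.5, 0.4, 0.6]),
                         bootstrap_value=0.3,
                         rhos=np.array([1.2, 0.8, 1.0]))
print(targets)
```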
Communication is compositional if complex signals can be represented as a combination of simpler subparts. In this paper, we theoretically show that inductive biases on both the training framework and the data are needed to develop compositional communication. Moreover, we prove that compositionality spontaneously arises in signaling games in which agents communicate over a noisy channel. We experimentally confirm that a range of noise levels, which depends on the model and the data, indeed promotes compositionality. Finally, we provide a comprehensive study of this dependence and report results in terms of recently studied compositionality metrics: topographic similarity, conflict count, and context independence.
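Of the three metrics, topographic similarity is the most commonly reported; a common formulation is the Spearman correlation between pairwise distances in meaning space and in message space. The sketch below uses Hamming distances on toy data; the distance functions used in the paper may differ.

```python
from itertools import combinations
from scipy.stats import spearmanr

def topographic_similarity(meanings, messages):
    """Spearman correlation between pairwise distances in meaning space and in
    message space (a common formulation; the paper's exact distances may differ)."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    pairs = list(combinations(range(len(meanings)), 2))
    meaning_d = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    message_d = [hamming(messages[i], messages[j]) for i, j in pairs]
    return spearmanr(meaning_d, message_d).correlation

# perfectly compositional toy language: each meaning attribute maps to one symbol
meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
messages = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
print(topographic_similarity(meanings, messages))   # close to 1.0
```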
Audio DeepFakes are artificially generated utterances created with deep learning methods with the main aim of fooling the listeners; most of such audio is highly convincing. Their quality is sufficient to pose a serious threat in terms of security and privacy, such as the reliability of news or defamation. To counter these threats, multiple neural-network-based methods for detecting generated speech have been proposed. In this work, we cover the topic of adversarial attacks, which decrease the performance of detectors by adding superficial (difficult to spot by a human) changes to the input data. Our contribution consists of evaluating the robustness of three detection architectures against adversarial attacks in two scenarios (white-box and using the transferability mechanism) and then enhancing it by the use of adversarial training performed with our novel adaptive training method.
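As an illustration of the white-box setting, here is a generic FGSM-style perturbation of an audio waveform in PyTorch. The toy detector, epsilon value, and attack choice are assumptions for the sketch; the paper evaluates its own set of attacks and a novel adaptive adversarial training, which this snippet does not reproduce.

```python
import torch

def fgsm_attack(detector, waveform, label, epsilon=0.002):
    """Generic white-box FGSM perturbation of an audio waveform (illustrative only)."""
    waveform = waveform.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(detector(waveform), label)
    loss.backward()
    # small signed-gradient step that is hard to hear but can flip the detector
    adversarial = waveform + epsilon * waveform.grad.sign()
    return adversarial.clamp(-1.0, 1.0).detach()

# toy detector standing in for a real spoofing-detection network
detector = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16000, 2))
clean = torch.randn(1, 16000)                 # one second of fake 16 kHz audio
adv = fgsm_attack(detector, clean, torch.tensor([1]))
```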
This short report reviews the current state of the research and methodology on theoretical and practical aspects of Artificial Neural Networks (ANN). It was prepared to gather the state-of-the-art knowledge needed to construct complex, hypercomplex and fuzzy neural networks. The report reflects the individual interests of the authors and by no means can be treated as a comprehensive review of the ANN discipline. Considering the fast development of this field, it is currently impossible to provide a detailed review within a reasonable number of pages. The report is an outcome of the Project 'The Strategic Research Partnership for the mathematical aspects of complex, hypercomplex and fuzzy neural networks' meeting at the University of Warmia and Mazury in Olsztyn, Poland, organized in September 2022.
This paper presents the Crowd Score, a novel method to assess the funniness of jokes using large language models (LLMs) as AI judges. Our method relies on inducing different personalities into the LLM and aggregating the votes of the AI judges into a single score to rate jokes. We validate the votes using an auditing technique that checks, using the LLM, whether the explanation for a particular vote is reasonable. We tested our methodology on 52 jokes in a crowd of four AI voters with different humour types: affiliative, self-enhancing, aggressive and self-defeating. Our results show that few-shot prompting leads to better results than zero-shot for the voting question. Personality induction showed that aggressive and self-defeating voters are significantly more inclined than affiliative and self-enhancing voters to find jokes from a set of aggressive/self-defeating jokes funny. The Crowd Score follows the same trend as human judges, assigning higher scores to jokes that human judges also consider funnier. We believe that our methodology could be applied to other creative domains such as stories, poetry, slogans, etc. It could help the adoption of a flexible and accurate standard approach to compare different work in the CC community under a common metric, and, by minimizing human participation in assessing creative artefacts, it could accelerate the prototyping of creative artefacts and reduce the cost of hiring human participants to rate them.
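A minimal sketch of the aggregation step, assuming each personality-induced AI judge casts a binary funny/not-funny vote that is averaged into a single score; the paper's exact scoring formula may differ from this illustrative version.

```python
def crowd_score(votes):
    """Aggregate per-personality funniness votes into a single score
    (illustrative aggregation, assuming binary votes averaged to 0-100).
    votes: dict mapping personality name -> 1 (funny) or 0 (not funny)."""
    return 100.0 * sum(votes.values()) / len(votes)

votes = {"affiliative": 1, "self-enhancing": 0, "aggressive": 1, "self-defeating": 1}
print(crowd_score(votes))   # 75.0: three of the four AI judges found the joke funny
```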
Recently proposed systems for open-domain question answering (OpenQA) require large amounts of training data to achieve state-of-the-art performance. However, data annotation is known to be time-consuming and therefore expensive to acquire. As a result, appropriate datasets are available only for a handful of languages (mainly English and Chinese). In this work, we introduce and publicly release PolQA, the first Polish dataset for OpenQA. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7,097,322 candidate passages. Each question is classified according to its formulation, type, as well as the entity type of the answer. This resource allows us to evaluate the impact of different annotation choices on the performance of the QA system and to propose an efficient annotation strategy that increases passage retrieval performance by 10.55 p.p. while reducing the annotation cost by 82%.